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ABSTRACT 

We describe a simple ansatz to approximate the low temperature behavior of proteins 
and peptides by a mean-field-like model which is analytically solvable. For a small pep- 
tide some thermodynamic quantities are calculated and compared with numerical results 
of an all-atoms simulation. Our approach can be used to determine the weights for a 
multicanonical simulation of the molecule under consideration. 
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Prediction of the three-dimensional structure of peptides and proteins solely from their 
amino-acid sequence remains one of the longstanding unsolved problems in bioscience. It 
is widely accepted that the three-dimensional shape at temperature T corresponds to the 
global minimum in free energy of the molecule. Therefore, given a sufficiently accurate 
description of the intramolecular forces, it should be in principle possible to predict such 
conformations through a numerical simulation. However, the complex form of the in- 
teractions containing both repulsive and attractive terms leads to a very rough energy 
landscape with a huge number of local minima separated by high energy barriers. These 
barriers slow down the ergodic exploration of conformation space at low temperatures. 
In the canonical ensemble at temperature T, the probability to cross an energy barrier of 
height AE is proportional to e _A£//fcsT . Hence, at low temperatures, canonical molecular 
dynamics and Monte Carlo, using local updates, will get trapped in one of these local 
minima. Within limited simulation time, only small parts of phase space are sampled and 
physical quantities cannot be calculated accurately. 

However, with the development of the multicanonical approach [Q, simulated tem- 
pering P| and other generalized-ensemble techniques, an efficient sampling of low energy 
configurations and an accurate calculation of thermodynamic quantities at low temper- 
atures became feasible, at least for small peptides. The first application of one such 
technique to the protein folding problem can be found in Ref. [|J. Later applications 
include the study of the coil-globular transitions of a model protein [[| and the helix-coil 
transitions of homo-oligomers of nonpolar amino acids p|. A numerical comparison of 
several generalized-ensemble algorithms can be found in Ref. ||. Nevertheless, calcula- 
tion of thermodynamic quantities by numerical simulations remains a time consuming 
process. The evaluation of energies which is necessary for each Monte Carlo or molecular 
dynamics step is itself CPU time intensive. Hence, it is desirable to find an approximate 
description of the thermodynamic properties of complicated molecules like peptides by 
using a much simpler effective model. The form of the effective model should retain the 
essential physical characteristics of the original system (number and nature of the degrees 
of freedom) and its parameters be adjusted to best reproduce the thermodynamic proper- 
ties of the original, especially at low temperature. The results from generalized ensemble 
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simulations now allow the development and testing of such an approach. 

Fixing bond lengths and bond angles (which within our range of temperatures can be 
considered as rigid), peptides and proteins can be described by a set of torsion angles. 
This is the motivation behind our choice to describe our effective model by % independent 
angles 9i,i = l,...,n, where 9{ 6 [— ir, 7r[. We assume that the potential energy of our 
system can be approximated by 

rip 

E*J2V(8), (1) 

where the "mean" potential V(8) is the same for each angle. The most general form for 
the periodic potential V is 

oo 

V(6) = a k cos k6 + b k sin k6 (2) 

fc=0 

Without loss of generality, we can impose that the global minimum of V occurs, 
for instance, at 9 = 0. Then the low-temperature behavior will be determined by the 
harmonic approximation 

V ha rmonic(9) = C + Ci(l - COS 9) . (3) 

To maintain the simplicity of our effective model, we truncate the expansion in Eq. (Q) to 

V{9) = c + ci(l - cosfl) + c 2 (l - cos#) 2 (4) 

with anharmonic term c 2 . The 3 parameters Cq,c\, and c 2 will be fitted to best match 
features of the original model. It is obvious that 

Co = ^ (5) 

Up 

where Eqs is the ground state energy of our molecule. 
The partition function of our effective model is simply 

Z = z nF with z= f +n d9e~ l3V(e) (6) 

Using the definition of the modified Bessel function I n (y) = - Jq dx cosnx e y COSIE , and 
the identity e y cosx = Io(y) + ^HkLi^kiy) coskx, the partition function per angle z can 
be expressed as 

3 

-3(c + ci + -c 2 ) / ~ 
z = 2vre 2 J (-/3c 2 /2)/ (J3(a + 2c 2 )) + 2 ^ I k (-(3c 2 /2)I 2k ((3( Cl + 2c 2 )) 

V fc=i / 

(7) 



The average energy < E > can be obtained through < E >= —d(logZ)/d(3, or 
directly as 



< E >- 

with 



3 c 2 
(c + C\ + -c 2 ) — (ci + 2c 2 ) < cosx > +— < cos2x > 



n 



<cosmx> = /(m)//(0) (9) 

f(m) = Jo(-/3c 2 /2)/ m (/3( Cl + 2c 2 )) (10) 

+ £ 4(-/3c 2 /2) [J 2fc+m (/3(d + 2c 2 )) + I 2k . m {P( Cl + 2c 2 ))] (11) 
fc=i 

and similarly for the specific heat which we define as 

C < T >-CT <£> (12) 

with Boltzmann constant kg and the number of residues. 

We expect Eq. [I], with the mean potential given by Eq. [|, to be a good approximation 
of our system for low temperatures. In principle the approximation can be systematically 
improved for higher temperatures by increasing the number of terms in the expansion of 
Eq. 0. However, this will lead to more complicated equations than for instance Eqs. |7| and 
H While periodicity of the variables 9 is not required for a low-temperature approxima- 
tion, this similarity with the original system improves the quality of the approximation 
at higher temperatures (e.g. the specific heat remains bounded). 

Our benchmark is one of the simplest peptides, Met-enkephalin, which has become 
an often used model to examine new algorithms in the protein folding problem. Met- 
enkephalin has the amino acid sequence Tyr-Gly-Gly-Phe-Met. For comparison and to 
fit the free parameters of our ansatz we use the results published in Ref. [0. They 
were derived from a multicanonical simulation of 200,000 sweeps using a potential energy 
function E to t given by the sum of electrostatic term Ec, Lennard- Jones term Elj, and 
hydrogen-bond term E^ for all pairs of atoms in the peptide together with the torsion 
term E tor s for all torsion angles. The parameters for the energy function were adopted 
from ECEPP/2 ||. By fixing the peptide bond angles u to 180°, 19 torsion angles were 
left as degrees of freedom (i.e. rip = 19). 



In our ansatz using Eq. [7] to approximate our molecule we have three free parameters 
which have to be fitted against numerical results. Since our approximation is only valid 
for low temperatures, we should perform the fit against results at the lowest reliable 
temperatures available. We expect canonical simulations of peptides and proteins to be 
computationally feasible down to the folding temperature Tp. A fit of our free parameters 
should therefore be done against thermodynamic quantities measured in the vicinity of 
this temperature. By using results from simulated annealing runs it may be possible to 
include results of even lower temperatures. In our case, however, we restricted ourselves 
to the temperature range 200 K to 240 K, since the folding temperature was found to be 
Tp w 220 K for Met-enkephalin ||. For our fit in this temperature range we use results 
for the average energy < E > and specific heat C(T) as obtained from a multicanonical 
simulation and published in Ref. ]/]]. We find Cq = —0.55 kcal/mol, c\ = 4.77 kcal/mol 
and C2 = —2.05 kcal/mol. These parameters were obtained in the following way. Since 
one shortcoming of our ansatz is to underestimate the peak in the specific heat, we first 
fixed the ratio oijc\ to the value —0.43 which gives the highest specific heat peak. Then 
we adjusted c\ to reproduce the specific heat in the chosen temperature range 200 K to 
240 K. Finally the remaining parameter cq was fixed by requiring that our fit reproduce 
the values for the average energy < E > in the chosen temperature range. For the case of 
a pure harmonic approximation, i.e. fixing C2 = 0, we were not able to obtain a fit which 
would reproduce < E > and C(T) in the temperature range 200 K to 240 K. The closest 
low-temperature match was obtained for a choice of parameters cq = —0.55 kcal/mol and 
C\ = 1.19 kcal/mol. We remark that npCo is an estimate for the ground state energy. 
Hence, the value of npco ~ —10.5 kcal/mol should be compared with the known true 
value of Eqs = —10.72 kcal/mol. 

Using the above values for q we were now able to calculate estimates for the average 
energy < E > and specific heat C(T) by our ansatz and compare the obtained results 
with those of a multicanonical simulation. In Fig. 1 we display < E > and in Fig. 2 the 
specific heat C(T) as obtained by our ansatz (with and without anharmonicity constant 
C2) and by a multicanonical simulation. As one can clearly see from our data, it is 
necessary to include anharmonic terms in our ansatz. A pure harmonic approximation 
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produces reasonable values for the average energy < E > only for temperatures below 
150 K and for the specific heat below T ~ 90 K. Including an anharmonicity constant 
C2 dramatically improves the ability of our ansatz to predict average energies. Values for 
this quantity can now be reproduced up to temperatures ~ 300 K, i.e. to temperatures 
where usual canonical Monte Carlo simulations are feasible. However, our approach is 
less successful for the specific heat, where the data from the multicanonical simulation 
can only be reproduced up to temperatures ~ 220 K, the folding temperature T F . Better 
agreement could be obtained with the addition of further anharmonic terms. 

We have demonstrated that for small peptides the low temperature behavior of energy 
and related quantities can be described by our simple ansatz. In addition, our ansatz 
can also be used to obtain estimators for the weights in generalized-ensemble simulations. 
Unlike for the canonical ensemble these weights are not a priori known, and a great 
fraction of the computing effort (about 1/2) goes into their determination prior to the 
simulation itself. As an example take the multicanonical algorithm. Here, the weights are 
defined by 

w mu {E)<xn-\E)=e- s ^ , (13) 

where n(E) is the spectral density and S(E) the microcanonical entropy. Obviously, 
knowledge of the weights in this ensemble is equivalent to that of the spectral density, 
i.e. to solving the system. Hence, the need for estimators. Usually these estimators are 
calculated by an iterative procedure, first described in Ref. [l|]. However, convergence 
of this method depends strongly on the model under consideration, and determining the 
weights can be a time consuming and difficult process. Several attempts have been made 
to speed up their calculation, see for instance Refs. [10, 11, E|, but there is still need 



for further improvement. Here, we propose to take as estimators for the multicanonical 
weights 

w mu (E)<xn- 1 (E)=e-^ , (14) 

where n(E) (S(E)) is the spectral density (microcanonical entropy) of our effective model. 
Note that once multicanonical weights are known, it is easy to calculate the weights for 
other generalized ensemble techniques as was shown in Ref. [13 . In principle, n(E) can 



be calculated directly from the partition function Eq. ^. However, in practice it is easier 
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to calculate the canonical entropy 

S(T) = <E > {t) +\ogZ(T) , (15) 
k B T 

and approximate S(E) ~ S(T[< E >]). The function < E > (T) can be calculated 
by our ansatz, which allows us in turn to obtain T(< E >). This approximation of the 
microcanonical entropy by the canonical one becomes exact for rip — > oo. 

With the above ansatz we were able to calculate new multicanonical weights for Met- 
enkephalin, for energies between —10.7 and kcal/mol. The upper limit comes from the 
fact that our approach gives a good approximation of the average energy only for T < 
300 K, where < E > (T = 300 K) « 0. For higher energies we set w(E) = c ■ e - B /(300-fc B ) 
where the constant c is chosen such that the weights are a continuous function of the 
energy at E — 0. In this way we also allow sampling of high energy states. However, 
it is clear that with so defined weights use of the re-weighting techniques is essentially 
restricted to temperatures T < 300 K. Re-weighting to higher temperatures would lead to 
expectation values hampered by increasing errorbars since the sampling of configuration 
becomes poorer and poorer with increasing energies for E > 0. 

We used for the calculation of multicanonical weights our full 3-parameter ansatz. 
since the harmonic approximation fails already at much lower temperatures (and hence 
energies). In Fig. 3 we show the distribution in energy obtained from a simulation with 
1,000,000 sweeps using these weights. The distribution covers the chosen energy range 
and is essentially flat in this range . The histogram entries for the energies differ by no 
more than a factor 2. This near-constancy demonstrates that the entropy Eq. |l| of our 
ansatz is a very good approximation of the exact microcanonical entropy, yielding good 
estimators for the multicanonical weights. The quality of these estimators and the range 
of energies could easily be enhanced by further iterations with the method described first 
in Ref. [l|]. However, since our simulation covers a broad range of energies, we can use 
standard reweighting techniques [[L4[ and obtain already with the given weights accurate 
thermodynamic quantities over a range of temperatures not accessible by canonical sim- 
ulations. As an example we show in Fig. 4 the average end-to-end distance < D > (T) 
(measured from N of Tyr 1 to O of Met 5) as a function of temperature as obtained from 
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the above multicanonical simulation with the new weights. This quantity is a simple mea- 
sure for the compactness of configurations. For comparison we also show the results from 
a multicanonical simulation with the same number of sweeps but with the more constant 
weights used in Ref. 0. As one can clearly see the two sets of weights lead to values of the 
end-to-end distance compatible with each other. Note that in the case of the new weights 
the errorbars increase rapidly with temperature for T > 300 K, reflecting the fact that for 
E > configurations are no longer sampled according to a uniform distribution but to a 
Boltzmann distribution at T = 300 K. On the other hand, the old set of weights yields a 
flat distribution for energies up to ~ 20 kcal/mol (which corresponds to T = lOOOif) and 
therefore the size of errorbars does not increase with temperature in the shown range of 
temperatures. 

For a study of the protein folding problem one would prefer proteins which are at 
least an order of magnitude larger than Met-enkephalin, the peptide we used in this 
work. It is difficult to predict how useful our ansatz is going to be in this case. On 
the one hand one expects a statistical, mean-field description to be even more accurate 
on a larger system. On the other hand, a larger protein will have a more pronounced 
folding transition, becoming first-order- like (for a discussion on the nature of the folding 
transition see, for instance, Ref. JTB|), and this collective phenomenon will be impossible 



to capture with our mean-field approach. Therefore we would expect our ansatz to work 
better below the folding temperature Tp, more poorly at or above it. Its efficiency for the 
determination of multicanonical weights remains to be determined. 

To summarize our results, we have introduced a simple mean-field-like model to de- 
scribe the low temperature behavior of peptides and proteins, which can be systematically 
improved by including higher Fourier components in the mean-field potential. For a small 
peptide we calculated some thermodynamic quantities and compared the results with the 
ones from an all-atoms simulation. In addition, our ansatz allows for peptides a sim- 
ple and efficient calculation of multicanonical weights alleviating in this way the main 
drawback of these techniques. 
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Figures: 

1. Average energy < E > as a function of temperature. Shown are the results obtained 
from our ansatz, for both c<i = (dot dashed line) and Ci 7^ (unbroken line). For 
comparison we also show the results obtained from a multicanonical run of 200,000 
sweeps (marked by +). 

2. Specific heat C(T) as a function of temperature. Shown are the results obtained 
from our ansatz, for both C2 = (dot dashed line) and C2 7^ (unbroken line). For 
comparison we also show the results obtained from a multicanonical run of 200,000 
sweeps (marked by +). 

3. Multicanonical distribution of energy, P(E) oc e -5 ^" 1 " 5 ^, obtained from a multi- 
canonical simulation of 1,000,000 sweeps with weights calculated from our ansatz. 

4. Average end-to-end-distance < D as a function of temperature. The data 
marked by o are from a multicanonical simulations of 1,000,000 sweeps with weights 
obtained by our new ansatz. For comparison we also show the results (marked by +) 
from a multicanonical simulation with same number of MC sweeps and the weights 
of Ref. which were obtained by the older iterative procedure. 
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